-
Adversarial attacks pose significant challenges in many machine learning applications, particularly in distributed training and federated learning, where malicious agents seek to corrupt the training process with the goal of compromising the performance and reliability of the final models. In this paper, we address the problem of robust federated learning in the presence of such attacks by formulating the training task as a bi-level optimization problem. We conduct a theoretical analysis of the resilience of consensus-based bi-level optimization (CB2O), an interacting multi-particle metaheuristic optimization method, in adversarial settings. Specifically, we provide a global convergence analysis of CB2O in mean-field law in the presence of malicious agents, demonstrating the robustness of CB2O against a diverse range of attacks. In doing so, we offer insights into how specific hyperparameter choices mitigate adversarial effects. On the practical side, we extend CB2O to the clustered federated learning setting by proposing FedCB2O, a novel interacting multi-particle system, and design a practical algorithm that addresses the demands of real-world applications. Extensive experiments demonstrate the robustness of the FedCB2O algorithm against label-flipping attacks in decentralized clustered federated learning scenarios, showcasing its effectiveness in practical contexts. This article is part of the theme issue ‘Partial differential equations in data science’.
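A short sketch can illustrate the consensus mechanism on which CB2O builds. This is not the paper's FedCB2O algorithm; it is a generic consensus-based optimization (CBO) step in which particles drift toward a Gibbs-weighted consensus point, with the inverse temperature alpha among the hyperparameters governing how strongly the consensus concentrates on well-performing particles. All names and values below are illustrative.

```python
import numpy as np

def cbo_step(X, f, alpha=50.0, lam=1.0, sigma=0.8, dt=0.01, rng=None):
    """One consensus-based optimization (CBO) step for particles X of shape (n, d).

    f     : objective evaluated per particle, returning shape (n,)
    alpha : inverse temperature; larger values concentrate the consensus
            point on the currently best-performing particles
    lam   : drift strength toward the consensus point
    sigma : noise scale, proportional to each particle's distance from consensus
    """
    rng = np.random.default_rng() if rng is None else rng
    vals = f(X)
    w = np.exp(-alpha * (vals - vals.min()))  # shifted Gibbs weights for numerical stability
    consensus = (w[:, None] * X).sum(axis=0) / w.sum()
    diff = X - consensus
    dist = np.linalg.norm(diff, axis=1, keepdims=True)
    return X - lam * diff * dt + sigma * dist * np.sqrt(dt) * rng.standard_normal(X.shape)

# Illustrative usage: minimize a shifted quadratic with 200 particles in R^2
f = lambda X: ((X - np.array([1.0, -2.0])) ** 2).sum(axis=1)
rng = np.random.default_rng(0)
X = 3.0 * rng.standard_normal((200, 2))
for _ in range(500):
    X = cbo_step(X, f, rng=rng)
print(X.mean(axis=0))  # close to the minimizer (1, -2)
```

Heuristically, a larger alpha lets the consensus point discount particles with poor objective values, which is the kind of hyperparameter effect a robustness analysis against malicious particles has to quantify.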
-
Adversarial training is a min-max optimization problem that is designed to construct robust classifiers against adversarial perturbations of data. We study three models of adversarial training in the multiclass agnostic-classifier setting. We prove the existence of Borel measurable robust classifiers in each model and provide a unified perspective of the adversarial training problem, expanding the connections with optimal transport initiated by the authors in their previous work [21]. In addition, we develop new connections between adversarial training in the multiclass setting and total variation regularization. As a corollary of our results, we provide an alternative proof of the existence of Borel measurable solutions to the agnostic adversarial training problem in the binary classification setting.
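As a concrete instance of the min-max problem in the binary setting, the sketch below computes the exact adversarial (worst-case) 0-1 risk of a one-dimensional threshold classifier, where the inner maximization over an eps-ball can be resolved in closed form. The data, eps, and function names are illustrative assumptions, not a construction from the paper.

```python
import numpy as np

def adversarial_risk(threshold, x, y, eps):
    """Worst-case 0-1 risk of the 1-D classifier 1{x > threshold}.

    A point (x_i, y_i) is robustly correct only if every perturbation
    z in [x_i - eps, x_i + eps] lands on the correct side of the threshold.
    """
    robust_correct = np.where(y == 1, x - eps > threshold, x + eps <= threshold)
    return 1.0 - robust_correct.mean()

# Illustrative data: two overlapping classes on the real line
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-1.0, 1.0, 500), rng.normal(1.0, 1.0, 500)])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])

# Outer minimization: pick the threshold with smallest adversarial risk
grid = np.linspace(-3.0, 3.0, 601)
risks = [adversarial_risk(t, x, y, eps=0.3) for t in grid]
print(grid[int(np.argmin(risks))], min(risks))
```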
-
In the standard formulation of the classical denoising problem, one is given a probabilistic model relating a latent variable $$\varTheta \in \varOmega \subset {\mathbb{R}}^{m} \; (m \ge 1)$$ and an observation $$Z \in {\mathbb{R}}^{d}$$ according to $$Z \mid \varTheta \sim p(\cdot \mid \varTheta )$$ and $$\varTheta \sim G^{*}$$, and the goal is to construct a map to recover the latent variable from the observation. The posterior mean, a natural candidate for estimating $$\varTheta $$ from $$Z$$, attains the minimum Bayes risk (under the squared error loss) but at the expense of over-shrinking $$Z$$, and in general may fail to capture the geometric features of the prior distribution $$G^{*}$$ (e.g. low dimensionality, discreteness, sparsity). To rectify these drawbacks, in this paper we take a new perspective on the denoising problem, inspired by optimal transport (OT) theory, and use it to study a different, OT-based, denoiser at the population level. We rigorously prove that, under general assumptions on the model, this OT-based denoiser is mathematically well-defined and unique, and is closely connected to the solution of a Monge OT problem. We then prove that, under appropriate identifiability assumptions on the model, the OT-based denoiser can be recovered solely from the marginal distribution of $$Z$$ and the posterior mean of the model, after solving a linear relaxation problem over a suitable space of couplings that is reminiscent of standard multimarginal OT problems. In particular, due to Tweedie’s formula, when the likelihood model $$\{ p(\cdot \mid \theta ) \}_{\theta \in \varOmega }$$ is an exponential family of distributions, the OT-based denoiser can be recovered solely from the marginal distribution of $$Z$$. In general, our family of OT-like relaxations is of interest in its own right, and for the denoising problem it suggests alternative numerical methods inspired by the rich literature on computational OT.
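In the Gaussian-noise exponential family case, Tweedie’s formula mentioned above expresses the posterior mean through the marginal of $$Z$$ alone: $${\mathbb{E}}[\varTheta \mid Z = z] = z + \sigma ^{2} \, \nabla \log p_{Z}(z)$$. Below is a minimal numerical check of this identity with an illustrative two-point prior; all concrete values are assumptions made for the example.

```python
import numpy as np

sigma = 0.5
theta = np.array([-1.0, 1.0])   # support of the discrete prior G*
prior = np.array([0.3, 0.7])    # prior weights

def marginal(z):
    # p(z) = sum_k prior_k * N(z; theta_k, sigma^2)
    kernels = np.exp(-(z - theta) ** 2 / (2 * sigma ** 2))
    return (prior * kernels).sum() / np.sqrt(2 * np.pi * sigma ** 2)

def posterior_mean(z):
    w = prior * np.exp(-(z - theta) ** 2 / (2 * sigma ** 2))
    return (w * theta).sum() / w.sum()

# Tweedie: E[Theta | Z = z] = z + sigma^2 * d/dz log p(z)
z, h = 0.4, 1e-5
score = (np.log(marginal(z + h)) - np.log(marginal(z - h))) / (2 * h)
print(posterior_mean(z), z + sigma ** 2 * score)  # the two values agree
```

Note how the posterior mean pulls $$z$$ toward the prior support points, which is exactly the shrinkage behavior that motivates the OT-based alternative.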
-
We propose iterative algorithms to solve adversarial training problems in a variety of supervised learning settings of interest. Our algorithms, which can be interpreted as suitable ascent-descent dynamics in Wasserstein spaces, take the form of a system of interacting particles. These interacting particle dynamics are shown to converge toward appropriate mean-field limit equations in suitable large-particle-number regimes. In turn, we prove that, under certain regularity assumptions, these mean-field equations converge, in the large time limit, toward approximate Nash equilibria of the original adversarial learning problems. We present results for non-convex non-concave settings, as well as for non-convex concave ones. Numerical experiments illustrate our results.
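The flavor of these dynamics can be sketched with two finite particle populations: learner particles descend a payoff averaged over the adversary population while adversary particles ascend it, a mean-field-type interaction. This is not the paper's specific scheme; the payoff, step size, and names are illustrative.

```python
import numpy as np

def ascent_descent_step(X, Y, grad_x, grad_y, eta=0.05):
    """Simultaneous ascent-descent for two interacting particle populations.

    Each learner particle x descends the payoff averaged over the adversary
    particles Y; each adversary particle y ascends the payoff averaged over
    the learner particles X.
    """
    gx = np.array([grad_x(x, Y).mean(axis=0) for x in X])  # per-learner averaged gradient
    gy = np.array([grad_y(X, y).mean(axis=0) for y in Y])  # per-adversary averaged gradient
    return X - eta * gx, Y + eta * gy

# Illustrative payoff f(x, y) = x.y + 0.1|x|^2 - 0.1|y|^2 (equilibrium at the origin)
grad_x = lambda x, Y: Y + 0.2 * x  # gradient of f in x, for each adversary particle
grad_y = lambda X, y: X - 0.2 * y  # gradient of f in y, for each learner particle
rng = np.random.default_rng(2)
X, Y = rng.standard_normal((50, 2)), rng.standard_normal((50, 2))
for _ in range(2000):
    X, Y = ascent_descent_step(X, Y, grad_x, grad_y)
print(np.abs(X).mean(), np.abs(Y).mean())  # both populations contract toward the equilibrium
```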
-
We analyze the convergence properties of Fermat distances, a family of density-driven metrics defined on Riemannian manifolds with an associated probability measure. Fermat distances may be defined either on discrete samples from the underlying measure, in which case they are random, or in the continuum setting, where they are induced by geodesics under a density-distorted Riemannian metric. We prove that discrete, sample-based Fermat distances converge to their continuum analogues in small neighborhoods, with a precise rate that depends on the intrinsic dimensionality of the data and on the parameter governing the extent of density weighting in Fermat distances. This is done by leveraging novel geometric and statistical arguments in percolation theory that allow for non-uniform densities and curved domains. Our results are then used to prove that discrete graph Laplacians based on discrete, sample-driven Fermat distances converge to corresponding continuum operators. In particular, we show that the discrete eigenvalues and eigenvectors converge to their continuum analogues at a dimension-dependent rate, which allows us to interpret the efficacy of discrete spectral clustering using Fermat distances in terms of the resulting continuum limit. The perspective afforded by our discrete-to-continuum Fermat distance analysis leads to new clustering algorithms for data and related insights into efficient computations associated with density-driven spectral clustering. Our theoretical analysis is supported by numerical simulations and by experiments on synthetic and real image data.
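A minimal sketch of the discrete, sample-based Fermat distance under the usual graph construction (shortest paths through the sample with edge costs $$|x_i - x_j|^{p}$$) is given below; the helper name and parameter values are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

def sample_fermat_distances(X, p=3.0):
    """All-pairs sample Fermat distances for points X of shape (n, d).

    The Fermat distance between x_i and x_j is the minimum, over paths
    through the sample, of the sum of Euclidean edge lengths raised to the
    power p >= 1; p controls the strength of the density weighting, and
    p = 1 recovers plain Euclidean distances on the complete graph.
    """
    W = squareform(pdist(X)) ** p  # complete graph with edge weights |x_i - x_j|^p
    return shortest_path(W, method="D", directed=False)

# Illustrative usage: as p grows, paths through dense regions become cheaper
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, (100, 2)),
               rng.normal([3.0, 0.0], 0.3, (100, 2))])
D = sample_fermat_distances(X, p=3.0)
print(D.shape, D[0, 150])  # (200, 200) and one cross-cluster distance
```

Distances of this kind can be fed directly into a graph Laplacian or spectral clustering pipeline, which is the discrete object whose continuum limit the analysis above addresses.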